Results 1 - 20 of 54
1.
J Proteome Res ; 2024 May 08.
Article in English | MEDLINE | ID: mdl-38717300

ABSTRACT

The availability of an increasingly large number of public proteomics data sets presents an opportunity to perform combined analyses that generate comprehensive organism-wide protein expression maps across different organisms and biological conditions. Sus scrofa (the domestic pig) is a model organism relevant both for food production and for human biomedical research. Here, we reanalyzed 14 public proteomics data sets from pig tissues, obtained from the PRIDE database, to assess baseline (without any biological perturbation) protein abundance in 14 organs, encompassing a total of 20 healthy tissues from 128 samples. The analysis involved the quantification of protein abundance in 599 mass spectrometry runs. We compared protein expression patterns among different pig organs and examined the distribution of proteins across these organs. Then, we studied how protein abundances compared across different data sets and examined the tissue specificity of the detected proteins. Of particular interest, we conducted a comparative analysis of protein expression between pig and human tissues, revealing a high degree of correlation in protein expression among orthologs, particularly in brain, kidney, heart, and liver samples. We have integrated the protein expression results into the Expression Atlas resource, where the data can be easily accessed and visualized individually or alongside gene expression data.

2.
Bioinform Adv ; 4(1): vbae048, 2024.
Article in English | MEDLINE | ID: mdl-38638280

ABSTRACT

Motivation: Cell-type deconvolution methods aim to infer cell composition from bulk transcriptomic data. The proliferation of such methods, coupled with the inconsistent results they yield in many cases, highlights the pressing need for guidance in selecting appropriate methods. Additionally, the growing accessibility of single-cell RNA sequencing datasets, often accompanied by bulk expression data from related samples, enables the benchmarking of existing methods. Results: In this study, we conduct a comprehensive assessment of 31 methods, utilizing single-cell RNA-sequencing data from diverse human and mouse tissues. Employing various simulation scenarios, we reveal the efficacy of regression-based deconvolution methods and highlight their sensitivity to the choice of reference. We investigate the impact of bulk-reference differences, incorporating variables such as sample, study and technology. We provide validation using a gold-standard dataset from mononuclear cells and suggest a consensus prediction of proportions when ground truth is not available. We validated the consensus method on data from the stomach and studied its spillover effect. Importantly, we propose the use of the critical assessment of transcriptomic deconvolution (CATD) pipeline, which encompasses functionalities for generating references and pseudo-bulks and for running the implemented deconvolution methods. CATD streamlines the simultaneous deconvolution of numerous bulk samples, providing a practical solution for speeding up the evaluation of newly developed methods. Availability and implementation: https://github.com/Papatheodorou-Group/CATD_snakemake.

3.
Nucleic Acids Res ; 52(D1): D107-D114, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37992296

ABSTRACT

Expression Atlas (www.ebi.ac.uk/gxa) and its newest counterpart, the Single Cell Expression Atlas (www.ebi.ac.uk/gxa/sc), are EMBL-EBI's knowledgebases for gene and protein expression and localisation in bulk and at the single-cell level. These resources aim to allow users to investigate gene and protein expression in normal tissue (baseline) or in response to perturbations such as disease or changes to genotype (differential) across multiple species. Users are invited to search for genes or metadata terms across species or biological conditions in a standardised, consistent interface. Alongside these data, new features in Single Cell Expression Atlas allow users to query metadata through our new cell type wheel search. At the experiment level, data can be explored through two types of dimensionality reduction plots, t-distributed Stochastic Neighbor Embedding (tSNE) and Uniform Manifold Approximation and Projection (UMAP), overlaid with either clustering or metadata information to assist users' understanding. Data are also visualised as marker gene heatmaps identifying genes that help confer cluster identity. For some data, additional visualisations are available as interactive cell-level anatomograms and cell type gene expression heatmaps.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Proteomics , Genotype , Metadata , Single-Cell Analysis , Internet , Humans , Animals
4.
Front Cell Dev Biol ; 11: 1297910, 2023.
Article in English | MEDLINE | ID: mdl-38020918

ABSTRACT

Melanoma is the deadliest form of skin cancer and develops from melanocytes, the cells responsible for skin pigmentation. The skin is also a highly regenerative organ, harboring a pool of undifferentiated melanocyte stem cells that proliferate and differentiate into mature melanocytes during regenerative processes in the adult. Melanoma and melanocyte regeneration share remarkable cellular features, including activation of cell proliferation and migration. Yet melanoma differs considerably from regenerating melanocytes with respect to abnormal proliferation, invasive growth, and metastasis. Thus, it is likely that, at the cellular level, melanoma resembles the early stages of melanocyte regeneration, with increased proliferation, but diverges from the later stages, which are characterized by reduced proliferation and enhanced differentiation. Here, by exploiting zebrafish melanocytes, which can efficiently regenerate and can be induced to form malignant melanoma, we unravel the transcriptome profiles of regenerating melanocytes during early and late regeneration and of melanocytic nevi and malignant melanoma. Our global comparison of the gene expression profiles of melanocyte regeneration and nevi/melanoma uncovers the opposite regulation, between regeneration and cancer, of a substantial number of genes related to the Wnt and transforming growth factor beta (TGF-β)/bone morphogenetic protein (BMP) signaling pathways. Functional activation of the canonical Wnt or TGF-β/BMP pathways during melanocyte regeneration promoted regeneration but potently suppressed the invasiveness, migration, and proliferation of human melanoma cells in vitro and in vivo. Therefore, the opposite regulation of signaling mechanisms between melanocyte regeneration and melanoma can be exploited to stop tumor growth and develop new anti-cancer therapies.

5.
Nat Commun ; 14(1): 6495, 2023 10 14.
Article in English | MEDLINE | ID: mdl-37838716

ABSTRACT

The growing number of available single-cell gene expression datasets from different species creates opportunities to explore evolutionary relationships between cell types across species. Cross-species integration of single-cell RNA-sequencing data has been particularly informative in this context. However, to perform such integration robustly, rigorous benchmarking and appropriate guidelines are essential to ensure that integration results truly reflect biology. Here, we benchmark 28 combinations of gene homology mapping methods and data integration algorithms in a variety of biological settings. We examine the capability of each strategy to perform species-mixing of known homologous cell types and to preserve biological heterogeneity using 9 established metrics. We also develop a new biology conservation metric to address the maintenance of cell type distinguishability. Overall, the scANVI, scVI and SeuratV4 methods achieve a balance between species-mixing and biology conservation. For evolutionarily distant species, including in-paralogs is beneficial. SAMap outperforms the other strategies when integrating whole-body atlases between species with challenging gene homology annotation. We provide our freely available cross-species integration and assessment pipeline to help analyse new data and develop new algorithms.


Subject(s)
Algorithms , Benchmarking , Molecular Sequence Annotation , Exome Sequencing , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
6.
J Pathol Inform ; 14: 100328, 2023.
Article in English | MEDLINE | ID: mdl-37693862

ABSTRACT

Pathologists need to compare histopathological images of normal and diseased tissues between different samples, cases, and species. We have designed an interactive system, termed the Comparative Pathology Workbench (CPW), which allows direct and dynamic comparison of images at a variety of magnifications and of selected regions of interest, as well as of the results of image analysis or other data analyses such as scRNA-seq. This allows pathologists to indicate key diagnostic features, with a mechanism for discussion threads among expert groups of pathologists and other disciplines. The data and associated discussions can be accessed online from anywhere in the world. The CPW is a web-browser-based visual analytics platform providing shared access to an interactive "spreadsheet"-style presentation of images and associated analysis data. The CPW provides a grid layout of rows and columns so that images corresponding to matching data can be organised in the form of an image-enabled "spreadsheet". An individual workbench can be shared with other users with read-only or full edit access as required. In addition, each workbench element, as well as the whole bench itself, has an associated discussion thread to allow collaborative analysis and consensual interpretation of the data. The CPW is a Django-based web application that hosts the workbench data and manages users and user preferences. All image data are hosted by other resource applications such as OMERO or the Digital Slide Archive, and further resources can be added as required. The discussion threads are managed using WordPress and include additional graphical and image data. The CPW has been developed to allow integration of image analysis outputs from systems such as QuPath or ImageJ. All software is open-source and available from a GitHub repository.

7.
J Clin Med ; 12(12)2023 Jun 07.
Article in English | MEDLINE | ID: mdl-37373578

ABSTRACT

Crohn's disease (CD) is a chronic inflammatory bowel disease with a high prevalence throughout the world. The development of Crohn's-related fibrosis, which leads to strictures in the gastrointestinal tract, presents a particular challenge and is associated with significant morbidity. There are currently no specific anti-fibrotic therapies available, and so treatment is aimed at managing the stricturing complications of fibrosis once it is established. This often requires invasive and repeated endoscopic or surgical intervention. The advent of single-cell sequencing has led to significant advances in our understanding of CD at a cellular level, and this has presented opportunities to develop new therapeutic agents with the aim of preventing or reversing fibrosis. In this paper, we discuss the current understanding of CD fibrosis pathogenesis, summarise current management strategies, and present the promise of single-cell sequencing as a tool for the development of effective anti-fibrotic therapies.

8.
J Pediatric Infect Dis Soc ; 12(6): 322-331, 2023 Jun 30.
Article in English | MEDLINE | ID: mdl-37255317

ABSTRACT

BACKGROUND: To identify a diagnostic blood transcriptomic signature that distinguishes multisystem inflammatory syndrome in children (MIS-C) from Kawasaki disease (KD), bacterial infections, and viral infections. METHODS: Children presenting with MIS-C to participating hospitals in the United Kingdom and the European Union between April 2020 and April 2021 were prospectively recruited. Whole-blood RNA sequencing was performed, contrasting the transcriptomes of children with MIS-C (n = 38) with those of children with KD (n = 136), definite bacterial infections (DB; n = 188) and definite viral infections (DV; n = 138). Genes significantly differentially expressed (SDE) between MIS-C and comparator groups were identified. Feature selection was used to identify genes that optimally distinguish MIS-C from other diseases; these genes were subsequently translated into RT-qPCR assays and evaluated in an independent validation set comprising MIS-C (n = 37), KD (n = 19), DB (n = 56), DV (n = 43), and COVID-19 (n = 39). RESULTS: In the discovery set, 5696 genes were SDE between MIS-C and the combined comparator disease groups. Five genes were identified as potential MIS-C diagnostic biomarkers (HSPBAP1, VPS37C, TGFB1, MX2, and TRBV11-2), achieving an AUC of 96.8% (95% CI: 94.6%-98.9%) in the discovery set, and were translated into RT-qPCR assays. The RT-qPCR 5-gene signature achieved an AUC of 93.2% (95% CI: 88.3%-97.7%) in the independent validation set when distinguishing MIS-C from KD, DB, and DV. CONCLUSIONS: MIS-C can be distinguished from the KD, DB, and DV groups using a 5-gene blood RNA expression signature. The small number of genes in the signature and its good performance in both the discovery and validation sets should enable the development of a diagnostic test for MIS-C.


Subject(s)
COVID-19 , Mucocutaneous Lymph Node Syndrome , Child , Humans , COVID-19/diagnosis , COVID-19/genetics , Systemic Inflammatory Response Syndrome/diagnosis , Systemic Inflammatory Response Syndrome/genetics , Hospitals , Mucocutaneous Lymph Node Syndrome/diagnosis , Mucocutaneous Lymph Node Syndrome/genetics , COVID-19 Testing
9.
Nat Rev Gastroenterol Hepatol ; 20(9): 597-614, 2023 09.
Article in English | MEDLINE | ID: mdl-37258747

ABSTRACT

The number of studies investigating the human gastrointestinal tract using various single-cell profiling methods has increased substantially in the past few years. Although this increase provides a unique opportunity to generate the first comprehensive Human Gut Cell Atlas (HGCA), a range of major challenges remains ahead. Above all, ultimate success will largely depend on a structured and coordinated approach that aligns the global efforts undertaken by a large number of research groups. In this Roadmap, we discuss a comprehensive, forward-thinking direction for the generation of the HGCA on behalf of the Gut Biological Network of the Human Cell Atlas. Based on the consensus opinion of experts from across the globe, we outline the main requirements for the first complete HGCA by summarizing existing data sets and highlighting anatomical regions and/or tissues with limited coverage. We provide recommendations for future studies and discuss key methodologies and the importance of integrating the healthy gut atlas with related diseases and gut organoids. Importantly, we critically review the computational tools available and provide recommendations to overcome key challenges.


Subject(s)
Gastrointestinal Tract , Organoids , Humans , Forecasting
10.
NAR Genom Bioinform ; 5(1): lqad014, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36879900

ABSTRACT

Bulk transcriptomes are an essential data resource for understanding basic and disease biology. However, integrating information from different experiments remains challenging because of the batch effect generated by various technological and biological variations in the transcriptome. Numerous batch-correction methods have been developed to deal with this batch effect. However, a user-friendly workflow to select the most appropriate batch-correction method for a given set of experiments is still missing. We present the SelectBCM tool, which prioritizes the most appropriate batch-correction method for a given set of bulk transcriptomic experiments, improving biological clustering and gene differential expression analysis. We demonstrate the applicability of the SelectBCM tool on real data for two common diseases, rheumatoid arthritis and osteoarthritis, and on one example characterizing a biological state, in which we performed a meta-analysis of the macrophage activation state. The R package is available at https://github.com/ebi-gene-expression-group/selectBCM.

11.
BMC Med Inform Decis Mak ; 23(1): 36, 2023 02 15.
Article in English | MEDLINE | ID: mdl-36793076

ABSTRACT

BACKGROUND: The Human Cell Atlas resource will deliver single-cell transcriptome data spatially organised in terms of gross anatomy and tissue location, together with images of cellular histology. This will enable the application of bioinformatics analysis, machine learning and data mining, revealing an atlas of cell types, sub-types, varying states and, ultimately, cellular changes related to disease conditions. To further develop the understanding of specific pathological and histopathological phenotypes with their spatial relationships and dependencies, a more sophisticated spatial descriptive framework is required to enable integration and analysis in spatial terms. METHODS: We describe a conceptual coordinate model for the Gut Cell Atlas (small and large intestines). Here, we focus on a Gut Linear Model (a 1-dimensional representation based on the centreline of the gut) that represents the location semantics typically used by clinicians and pathologists when describing locations in the gut. This knowledge representation is based on a set of standardised gut anatomy ontology terms describing regions in situ, such as the ileum or transverse colon, and landmarks, such as the ileo-caecal valve or hepatic flexure, together with relative or absolute distance measures. We show how locations in the 1D model can be mapped to and from points and regions in both a 2D model and 3D models, such as a patient's CT scan in which the gut has been segmented. RESULTS: The outputs of this work include 1D, 2D and 3D models of the human gut, delivered through publicly accessible JSON and image files. We also illustrate the mappings between models using a demonstrator tool that allows the user to explore the anatomical space of the gut. All data and software are fully open-source and available online. CONCLUSIONS: The small and large intestines have a natural "gut coordinate" system best represented as a 1D centreline through the gut tube, reflecting functional differences. Such a 1D centreline model with landmarks, visualised using viewer software, allows interoperable translation to both a 2D anatomogram model and multiple 3D models of the intestines. This permits users to accurately locate samples for data comparison.


Subject(s)
Imaging, Three-Dimensional , Software , Humans , Imaging, Three-Dimensional/methods
12.
J Proteome Res ; 22(3): 729-742, 2023 03 03.
Article in English | MEDLINE | ID: mdl-36577097

ABSTRACT

The availability of proteomics datasets in the public domain, and in the PRIDE database in particular, has increased dramatically in recent years. This unprecedented large-scale availability of data provides an opportunity for combined analyses of datasets to obtain organism-wide protein abundance data in a consistent manner. We have reanalyzed 24 public proteomics datasets from healthy human individuals to assess baseline protein abundance in 31 organs. We defined a tissue as a distinct functional or structural region within an organ. Overall, the aggregated dataset contains 67 healthy tissues, corresponding to 3,119 mass spectrometry runs covering 498 samples from 489 individuals. We compared protein abundances between different organs and studied the distribution of proteins across these organs. We also compared the results with data generated in analogous studies. Additionally, we performed gene ontology and pathway-enrichment analyses to identify organ-specific enriched biological processes and pathways. As a key point, we have integrated the protein abundance results into the Expression Atlas resource, where they can be accessed and visualized either individually or together with gene expression data from transcriptomics datasets. We believe this is a good mechanism to make proteomics data more accessible to life scientists.


Subject(s)
Proteome , Proteomics , Humans , Proteome/analysis , Proteomics/methods , Gene Expression Profiling , Databases, Factual , Mass Spectrometry/methods , Databases, Protein
13.
Nucleic Acids Res ; 51(D1): D9-D17, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36477213

ABSTRACT

The European Molecular Biology Laboratory's European Bioinformatics Institute (EMBL-EBI) is one of the world's leading sources of public biomolecular data. Based at the Wellcome Genome Campus in Hinxton, UK, EMBL-EBI is one of six sites of the European Molecular Biology Laboratory (EMBL), Europe's only intergovernmental life sciences organisation. This overview summarises the status of services that EMBL-EBI data resources provide to scientific communities globally. The scale, openness, rich metadata and extensive curation of EMBL-EBI added-value databases make them particularly well-suited as training sets for deep learning, machine learning and artificial intelligence applications, a selection of which are described here. The data resources at EMBL-EBI can catalyse such developments because they offer sustainable, high-quality data, collected in some cases over decades and made openly available to any researcher, globally. Our aim is for EMBL-EBI data resources to keep providing the foundations for tools and research insights that transform fields across the life sciences.


Subject(s)
Artificial Intelligence , Computational Biology , Data Management , Databases, Factual , Genome , Internet
14.
Plant Physiol ; 191(1): 35-46, 2023 01 02.
Article in English | MEDLINE | ID: mdl-36200899

ABSTRACT

We review how a data infrastructure for the Plant Cell Atlas might be built using existing infrastructure and platforms. The Human Cell Atlas has developed an extensive infrastructure for human and mouse single-cell data, while the European Bioinformatics Institute has developed a Single Cell Expression Atlas, which currently houses several plant data sets. We discuss issues related to appropriate ontologies for describing a plant single-cell experiment. We imagine how such an infrastructure will enable biologists and data scientists to glean new insights into plant biology in the coming decades, as long as such data are made accessible to the community in an open manner.


Subject(s)
Computational Biology , Plant Cells , Animals , Humans , Mice , Plants/genetics
15.
Sci Data ; 9(1): 335, 2022 06 14.
Article in English | MEDLINE | ID: mdl-35701420

ABSTRACT

The number of mass spectrometry (MS)-based proteomics datasets in the public domain keeps increasing, particularly those generated by Data Independent Acquisition (DIA) approaches such as SWATH-MS. Unlike that of Data Dependent Acquisition datasets, the re-use of DIA datasets has been rather limited to date, despite their high potential, due to the technical challenges involved. We introduce a (re-)analysis pipeline for public SWATH-MS datasets which includes a combination of metadata annotation protocols, automated workflows for MS data analysis, statistical analysis, and the integration of the results into the Expression Atlas resource. Automation is orchestrated with Nextflow, using containerised open analysis software tools, rendering the pipeline readily available and reproducible. To demonstrate its utility, we reanalysed 10 public DIA datasets from the PRIDE database, comprising 1,278 SWATH-MS runs. The robustness of the analysis was evaluated, and the results were compared to those obtained in the original publications. The final expression values were integrated into Expression Atlas, making SWATH-MS experiments more widely available and combining them with expression data originating from other proteomics and transcriptomics datasets.


Subject(s)
Proteomics , Software , Data Analysis , Databases, Protein , Datasets as Topic , Mass Spectrometry/methods , Proteomics/methods
16.
PLoS Comput Biol ; 18(6): e1010174, 2022 06.
Article in English | MEDLINE | ID: mdl-35714157

ABSTRACT

The increasingly large amount of proteomics data in the public domain enables, among other applications, the combined analysis of datasets to create comparative protein expression maps covering different organisms and different biological conditions. Here we have reanalysed public proteomics datasets from mouse and rat tissues (14 and 9 datasets, respectively) to assess baseline protein abundance. Overall, the aggregated dataset contained 23 individual datasets, including a total of 211 samples coming from 34 different tissues across 14 organs, and comprising 9 mouse and 3 rat strains. In all cases, we studied the distribution of canonical proteins between the different organs. The number of canonical proteins per dataset ranged from 273 (tendon) to 9,715 (liver) in mouse, and from 101 (tendon) to 6,130 (kidney) in rat. Then, we studied how protein abundances compared across different datasets and organs for both species. As a key point, we carried out a comparative analysis of protein expression between mouse, rat and human tissues. We observed a high level of correlation of protein expression among orthologs between all three species in brain, kidney, heart and liver samples, whereas the correlation of protein expression was generally slightly lower between organs within the same species. Protein expression results have been integrated into the Expression Atlas resource for widespread dissemination.


Subject(s)
Proteins , Proteomics , Animals , Brain/metabolism , Mice , Proteins/metabolism , Rats
17.
Front Cell Dev Biol ; 10: 813314, 2022.
Article in English | MEDLINE | ID: mdl-35223842

ABSTRACT

Gliomas are the most frequent type of brain cancer and are characterized by continuous proliferation, inflammation, angiogenesis, invasion and dedifferentiation, which are also among the factors that initiate and sustain brain regeneration during the restoration of tissue integrity and function. Thus, brain regeneration and brain cancer should share more molecular mechanisms at early stages of regeneration, where cell proliferation dominates. However, the mechanisms could diverge later, when the regenerative response terminates while cancer cells sustain proliferation. To test this hypothesis, we exploited the adult zebrafish, which, in contrast to mammals, can efficiently regenerate the brain in response to injury. By comparing transcriptome profiles of the regenerating zebrafish telencephalon at three different stages, i.e., 1 day post-lesion (dpl; early wound-healing stage), 3 dpl (early proliferative stage) and 14 dpl (differentiation stage), with those of two brain cancers, i.e., low-grade glioma (LGG) and glioblastoma (GBM), we reveal the common and distinct molecular mechanisms of brain regeneration and brain cancer. While the transcriptomes at 1 dpl and 3 dpl harbor unique gene modules and gene expression profiles that diverge more strongly from the control, the transcriptome at 14 dpl converges to that of the control. Next, by functional analysis comparing the transcriptomes of the brain regeneration stages with those of LGG and GBM, we reveal the common and distinct molecular pathways in regeneration and cancer. The 1 dpl stage resembles LGG and GBM with regard to signaling pathways related to metabolism and neurogenesis, while the 3 dpl stage and the two cancers share pathways that control cell proliferation and differentiation. On the other hand, the 14 dpl stage converges with LGG and GBM with respect to developmental and morphogenetic processes. Finally, our global comparison of the gene expression profiles of the three brain regeneration stages, LGG and GBM shows that 1 dpl is the stage most similar to LGG and GBM, while 14 dpl is the most distant from both brain cancers. Therefore, the early convergence and later divergence of brain regeneration and brain cancer constitute a key starting point for a comparative understanding of cellular and molecular events between the two phenomena and for the development of relevant targeted therapies for brain cancers.

18.
Eur Respir J ; 60(2)2022 08.
Article in English | MEDLINE | ID: mdl-35086829

ABSTRACT

The Human Cell Atlas (HCA) consortium aims to establish an atlas of all organs in the healthy human body at single-cell resolution to increase our understanding of basic biological processes that govern development, physiology and anatomy, and to accelerate diagnosis and treatment of disease. The Lung Biological Network of the HCA aims to generate the Human Lung Cell Atlas as a reference for the cellular repertoire, molecular cell states and phenotypes, and cell-cell interactions that characterise normal lung homeostasis in healthy lung tissue. Such a reference atlas of the healthy human lung will facilitate mapping the changes in the cellular landscape in disease. The discovAIR project is one of six pilot actions for the HCA funded by the European Commission in the context of the H2020 framework programme. discovAIR aims to establish the first draft of an integrated Human Lung Cell Atlas, combining single-cell transcriptional and epigenetic profiling with spatially resolving techniques on matched tissue samples, and also including a number of chronic and infectious diseases of the lung. The integrated Human Lung Cell Atlas will be available as a resource for the wider respiratory community, including basic and translational scientists, clinical medicine, and the private sector, as well as for patients with lung disease and the interested lay public. We anticipate that the Human Lung Cell Atlas will be the foundation for a more detailed understanding of the pathogenesis of lung diseases, guiding the design of novel diagnostics and preventive or curative interventions.


Subject(s)
Lung Diseases , Lung , Humans , Proteomics , Thorax
19.
Nucleic Acids Res ; 50(D1): D129-D140, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850121

ABSTRACT

The EMBL-EBI Expression Atlas is an added-value knowledge base that enables researchers to answer the question of where (tissue, organism part, developmental stage, cell type) and under which conditions (disease, treatment, gender, etc.) a gene or protein of interest is expressed. Expression Atlas brings together data from >4500 expression studies from >65 different species, across different conditions and tissues. It makes these data freely available in an easy-to-visualise form, after expert curation to accurately represent the intended experimental design, re-analysed via standardised pipelines that rely on open-source, community-developed tools. Each study's metadata are annotated using ontologies. The data are re-analysed with the aim of reproducing the original conclusions of the underlying experiments. Expression Atlas is currently divided into Bulk Expression Atlas and Single Cell Expression Atlas. Bulk Expression Atlas contains data from differential studies (microarray and bulk RNA-Seq) and baseline studies (bulk RNA-Seq and proteomics), whereas Single Cell Expression Atlas is currently dedicated to single-cell RNA-sequencing (scRNA-Seq) studies. The resource has been in continuous development since 2009 and is available at https://www.ebi.ac.uk/gxa.


Subject(s)
Databases, Genetic , Proteins/genetics , Proteomics , Software , Computational Biology , Gene Expression Profiling , Humans , Proteins/chemistry , RNA-Seq , Sequence Analysis, RNA , Single-Cell Analysis
20.
Nat Commun ; 12(1): 5854, 2021 10 06.
Article in English | MEDLINE | ID: mdl-34615866

ABSTRACT

The amount of public proteomics data is rapidly increasing, but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose extending the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve reproducibility and facilitate the reanalysis and integration of public proteomics datasets.


Subject(s)
Data Analysis , Databases, Protein , Metadata , Proteomics , Big Data , Humans , Reproducibility of Results , Software , Transcriptome